bag of words

Terms from Artificial Intelligence: humans at the heart of algorithms

A document consists of structured text: sentences, paragraphs, and meaningful phrases. Bag of words techniques ignore this structure, including word order, and reduce the document to the set of distinct words it contains, with a frequency count for each. This representation is used as the input to similarity metrics such as Jaccard similarity and cosine similarity.
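As a rough illustration of the idea, the following Python sketch builds a bag of words for two short sentences and compares them with both metrics. The tokenisation rule (lower-casing and splitting on anything other than letters and apostrophes), the function names, and the example sentences are assumptions made for this sketch, not something taken from the book. Jaccard similarity is computed here on the sets of distinct words, ignoring counts; cosine similarity uses the counts as vector components.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, extract runs of letters, and count each word."""
    # Assumed tokenisation: real systems often also remove stop words or stem.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def jaccard_similarity(bag_a, bag_b):
    """Size of the shared vocabulary divided by size of the combined vocabulary."""
    words_a, words_b = set(bag_a), set(bag_b)
    union = words_a | words_b
    if not union:
        return 0.0
    return len(words_a & words_b) / len(union)


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between the two word-count vectors."""
    dot = sum(count * bag_b[word] for word, count in bag_a.items())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_a = bag_of_words("The cat sat on the mat.")
doc_b = bag_of_words("The dog sat on the log.")

print(doc_a)                              # word counts; "the" appears twice
print(jaccard_similarity(doc_a, doc_b))   # 3 shared words out of 7 distinct
print(cosine_similarity(doc_a, doc_b))    # approximately 0.75
```

Because only counts are kept, a scrambled sentence such as "mat the on sat the cat" produces exactly the same bag as "The cat sat on the mat.", which is the loss of structure the definition describes.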

Used on page 208